DigiPath_MLTK functions

Low-level functionality for getting patch locations per image level

This notebook shows how to call two functions for finding patch locations, imported from pychunklbl.toolkit,
    on an OpenSlide whole slide image (WSI) file with multiple image levels.

The Aperio sample data is linked from openslide.org

patch_location_array = get_patch_location_array_for_image_level(run_parameters)

# help(get_patch_location_array_for_image_level)

""" Usage: 
patch_location_array = get_patch_location_array_for_image_level(run_parameters)
                        using patch_select_method, find all upper left corner locations of patches
                        that won't exceed image size given the 'patch_height' and 'patch_width'

Args (run_parameters):  python dict.keys()
                            wsi_filename:           file name (with valid path)
                            patch_height:           patch size = (patch_width, patch_height)
                            patch_width:            patch size = (patch_width, patch_height)
                            thumbnail_divisor:      wsi_image full size divisor to create thumbnail image
                            patch_select_method:    'threshold_rgb2lab' or 'threshold_otsu'
                            threshold:              minimum sum of thresholded image (default = 0)
                            image_level:            openslide image pyramid level 0,1,2,...
Returns:
    patch_location_array:   [[x, y], [x, y],... ]   n_pairs x 2 numpy array
"""
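
The "won't exceed image size" constraint in the docstring above can be pictured with a plain grid. The sketch below is not the toolkit's implementation: `grid_upper_left_corners` is a hypothetical helper that assumes non-overlapping patches laid out on a regular grid and applies no tissue mask or threshold, so it gives an upper bound on the number of locations the real function could return under those assumptions.

```python
from itertools import product


def grid_upper_left_corners(image_size, patch_width, patch_height):
    """Return [[x, y], ...] for every grid patch that fits fully inside the image.

    Hypothetical helper for illustration only: no masking or thresholding is done.
    """
    image_width, image_height = image_size
    # A corner is valid only if corner + patch size stays within the image bounds.
    xs = range(0, image_width - patch_width + 1, patch_width)
    ys = range(0, image_height - patch_height + 1, patch_height)
    # Row-major order: walk across each row of patches before moving down.
    return [[x, y] for y, x in product(ys, xs)]


# Level 3 of CMU-2.svs is reported as (2437, 951) pixels in this notebook.
corners = grid_upper_left_corners((2437, 951), 224, 224)
print(len(corners), corners[0], corners[-1])
```

An image smaller than one patch in either direction yields an empty list, since both ranges are empty in that case.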

mask_im, prev_im, patch_array = get_patch_locations_preview_image_for_image_level(run_parameters)

# help(get_patch_locations_preview_image_for_image_level)

""" Usage: with run_parameter dict "run_pars"
mask_image, thumb_preview, patch_location_array = get_patch_locations_preview_image_for_image_level(run_pars)
                                                  create viewable images to show patch locations

Args (run_parameters):  python dict.keys()
                            wsi_filename:           file name (with valid path)
                            patch_height:           patch size = (patch_width, patch_height)
                            patch_width:            patch size = (patch_width, patch_height)
                            thumbnail_divisor:      wsi_image full size divisor to create thumbnail image
                            patch_select_method:    'threshold_rgb2lab' or 'threshold_otsu'
                            threshold:              minimum sum of thresholded image (default = 0)
                            image_level:            openslide image pyramid level 0,1,2,...

                        Optional keys()
                            border_color:           patch-box representation color red, blue, green, ...

Returns:
    mask_image:             black & white image of the mask
    thumb_preview:          thumbnail image with patch locations marked
    patch_location_array:   list of patch locations used [(row, col), (row, col),... ]
"""
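
To make the relationship between a patch location and its box on the thumbnail concrete, here is a minimal sketch. It assumes locations are (x, y) upper-left corners expressed in the same pixel frame that `thumbnail_divisor` divides; the toolkit may scale or round differently. `patch_box_on_thumbnail` is a hypothetical name, not part of pychunklbl.

```python
def patch_box_on_thumbnail(location, patch_size, thumbnail_divisor):
    """Map a patch corner and size to a (left, top, right, bottom) thumbnail box.

    Hypothetical helper: assumes the location and the divisor share one pixel frame.
    """
    x, y = location
    patch_width, patch_height = patch_size
    # Integer division keeps the box on whole thumbnail pixels.
    left = x // thumbnail_divisor
    top = y // thumbnail_divisor
    right = (x + patch_width) // thumbnail_divisor
    bottom = (y + patch_height) // thumbnail_divisor
    return (left, top, right, bottom)


box = patch_box_on_thumbnail((4480, 2240), (224, 224), 20)
print(box)
```

A box in this form could be passed to a drawing routine together with the `border_color` value to outline the patch.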

The first cell imports the toolkit functions plus some extras for notebook display, and lists the available image files by size.

In [3]:
import time
note_book_cell_seq_run_time = time.time()
import os
import sys

from pychunklbl.toolkit import get_patch_location_array_for_image_level
from pychunklbl.toolkit import get_patch_locations_preview_image_for_image_level
from pychunklbl.toolkit import get_file_size_ordered_dict, lineprint_level_sizes_dict, get_level_sizes_dict

DEFAULT_THUMBNAIL_DIVISOR = 20

data_dir = '../../../ncsa/DigiPath_MLTK_data/Aperio'

fs_od = get_file_size_ordered_dict( data_dir, file_type_list=['.svs', '.tif', '.tiff'] )
list_number = 0
for file_name, file_size in fs_od.items():
    print('%3i %30s: %i'%(list_number, file_name, file_size) )
    list_number += 1
  0         CMU-1-Small-Region.svs: 1938955
  1               JP2K-33003-1.svs: 63847265
  2           CMU-1-JP2K-33005.svs: 132565343
  3                      CMU-1.svs: 177552579
  4                      CMU-3.svs: 253815723
  5               JP2K-33003-2.svs: 289250433
  6                      CMU-2.svs: 390750635

The next cell defines the dictionary of parameters needed to call these two functions.

In [4]:
""" Select a filename from the list of files available in your data directory 
"""
image_file_name = 'CMU-2.svs'
# image_file_name = 'JP2K-33003-2.svs'
# image_file_name = 'CMU-1-Small-Region.svs'

run_parameters = dict()
run_parameters['wsi_filename'] = os.path.join(data_dir, image_file_name)
print('Image File:\n', run_parameters['wsi_filename'])

run_parameters['thumbnail_divisor'] = DEFAULT_THUMBNAIL_DIVISOR
run_parameters['patch_select_method'] = 'threshold_otsu' # 'threshold_rgb2lab'
run_parameters['patch_height'] = 224
run_parameters['patch_width'] = 224
run_parameters['threshold'] = 0
run_parameters['border_color'] = 'blue'

lineprint_level_sizes_dict(run_parameters['wsi_filename'])
Image File:
 ../../../ncsa/DigiPath_MLTK_data/Aperio/CMU-2.svs
 
          image_size:  (78000, 30462)
         level_count:  4
    level_diminsions:  ((78000, 30462), (19500, 7615), (4875, 1903), (2437, 951))
   level_downsamples:  (1.0, 4.000131319763625, 16.003678402522333, 32.01905559532393)
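
The downsample factors printed above are not exact powers of two because each level's size is rounded to whole pixels. The reported values are consistent with averaging the width ratio and the height ratio against level 0; treating that as the rule is an assumption made for this sketch, and `estimate_downsamples` is a hypothetical helper rather than an OpenSlide or toolkit function.

```python
# Level dimensions as printed for CMU-2.svs in this notebook.
level_dimensions = ((78000, 30462), (19500, 7615), (4875, 1903), (2437, 951))


def estimate_downsamples(dimensions):
    """Estimate per-level downsample factors from (width, height) pairs.

    Assumption: the factor is the mean of the width ratio and the height ratio
    relative to level 0.
    """
    base_width, base_height = dimensions[0]
    return tuple(
        (base_width / width + base_height / height) / 2
        for width, height in dimensions
    )


downsamples = estimate_downsamples(level_dimensions)
for level, value in enumerate(downsamples):
    print(level, value)
```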
 
In [5]:
level_sizes_dict = get_level_sizes_dict(run_parameters['wsi_filename'])

n_levels = level_sizes_dict['level_count']
for im_lvl in range(0, n_levels):
    run_parameters['image_level'] = im_lvl
    run_parameters['thumbnail_divisor'] = DEFAULT_THUMBNAIL_DIVISOR
    
    t1 = time.time()

    patch_location_array = get_patch_location_array_for_image_level(run_parameters)
    print('\nimage_level = %i,\nnumber of patches = %i'%(im_lvl, len(patch_location_array)))
    
    thmb_div_4_level = run_parameters['thumbnail_divisor'] // (2*run_parameters['image_level'] + 1)
    run_parameters['thumbnail_divisor'] = thmb_div_4_level
    
    mask_im, prev_im, patch_array = get_patch_locations_preview_image_for_image_level(run_parameters)

    print('image_level = %i,\nnumber of patches = %i'%(im_lvl, len(patch_array)))
    print('cell run time: %0.3f'%(time.time() - t1))
    print('thumb image size:', prev_im.size, 'thumbnail_divisor', run_parameters['thumbnail_divisor'])
    
    display(prev_im)
image_level = 0,
number of patches = 14242
image_level = 0,
number of patches = 14242
cell run time: 2.348
thumb image size: (3900, 1522) thumbnail_divisor 20
image_level = 1,
number of patches = 1136
image_level = 1,
number of patches = 1089
cell run time: 1.357
thumb image size: (3250, 1269) thumbnail_divisor 6
image_level = 2,
number of patches = 93
image_level = 2,
number of patches = 105
cell run time: 0.379
thumb image size: (1218, 475) thumbnail_divisor 4
image_level = 3,
number of patches = 31
image_level = 3,
number of patches = 35
cell run time: 0.361
thumb image size: (1218, 475) thumbnail_divisor 2
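
The loop above shrinks the thumbnail divisor as the image level rises, using integer division by `2 * image_level + 1`. The sketch below reproduces only that arithmetic; `thumbnail_divisor_for_level` is a hypothetical helper name. Note that in the loop the first function runs with the default divisor while the preview function runs with the reduced one, which may be why the two patch counts agree at level 0 and differ at levels 1 to 3.

```python
DEFAULT_THUMBNAIL_DIVISOR = 20


def thumbnail_divisor_for_level(image_level, base_divisor=DEFAULT_THUMBNAIL_DIVISOR):
    """Reduce the thumbnail divisor for higher (smaller) pyramid levels.

    Mirrors the expression used in the loop: base // (2 * level + 1).
    """
    return base_divisor // (2 * image_level + 1)


divisors = [thumbnail_divisor_for_level(level) for level in range(4)]
print(divisors)
```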

Help on utility functions used for notebook organization and display:

In [6]:
help(get_level_sizes_dict)
Help on function get_level_sizes_dict in module pychunklbl.toolkit:

get_level_sizes_dict(image_file_name)
    Usage:  level_sizes_dict = get_level_sizes_dict(image_file_name)
            read an openslide image type file to get the pyramid sizes available
    
    Args:
        image_file_name: full path or on path file name of .svs or some openslide format
    
    Returns:
        level_sizes_dict:
                            level_sizes_dict['image_size'] = os_obj.dimensions
                            level_sizes_dict['level_count'] = os_obj.level_count
                            level_sizes_dict["level_downsamples"] = os_obj.level_downsamples
                            level_sizes_dict['level_diminsions'] = os_obj.level_dimensions

In [7]:
help(lineprint_level_sizes_dict)
Help on function lineprint_level_sizes_dict in module pychunklbl.toolkit:

lineprint_level_sizes_dict(image_file_name)
    Usage:  lineprint_level_sizes_dict(image_file_name)
            display the openslide image type file pyramid sizes available
    
    Args:
        image_file_name: full path or on path file name of .svs or some openslide format
    
    Returns:
        None:               (prints)
                            'image_size': os_obj.dimensions
                            'level_count': os_obj.level_count
                            'level_downsamples': os_obj.level_downsamples
                            'level_diminsions': os_obj.level_dimensions

In [8]:
help(get_file_size_ordered_dict)
Help on function get_file_size_ordered_dict in module pychunklbl.toolkit:

get_file_size_ordered_dict(data_dir, file_type_list)
    Usage:  file_size_ordered_dict = get_file_size_ordered_dict
        get size-ranked list of files of type in a directory
    
    Args:
        data_dir:           path to directory
        file_type_list:     file type extensions list (including period) e.g. = ['.svs', '.tif', '.tiff']
    
    Returns:
        ordered_dictionary: file_name: file_size   (ordered by file size)
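
For readers without the toolkit installed, the behavior described in this help text can be approximated with the standard library. This is a sketch under stated assumptions, not the pychunklbl implementation: `files_ordered_by_size` is a hypothetical name, it matches extensions case-sensitively, and it sorts ascending by size as the file listing earlier in this notebook suggests.

```python
import os
import tempfile
from collections import OrderedDict


def files_ordered_by_size(data_dir, file_type_list):
    """Return an OrderedDict of file_name: file_size, smallest file first.

    Hypothetical stand-in for get_file_size_ordered_dict; only regular files
    whose extension appears in file_type_list are included.
    """
    sizes = {}
    for name in os.listdir(data_dir):
        full_path = os.path.join(data_dir, name)
        extension = os.path.splitext(name)[1]
        if os.path.isfile(full_path) and extension in file_type_list:
            sizes[name] = os.path.getsize(full_path)
    return OrderedDict(sorted(sizes.items(), key=lambda item: item[1]))


# Demonstration on throwaway files; the names and sizes are made up.
with tempfile.TemporaryDirectory() as demo_dir:
    for name, n_bytes in (('big.svs', 30), ('small.tif', 10), ('notes.txt', 20)):
        with open(os.path.join(demo_dir, name), 'wb') as handle:
            handle.write(b'0' * n_bytes)
    demo_result = files_ordered_by_size(demo_dir, ['.svs', '.tif', '.tiff'])
    for list_number, (file_name, file_size) in enumerate(demo_result.items()):
        print('%3i %30s: %i' % (list_number, file_name, file_size))
```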

In [9]:
print('Notebook total run time = %0.3f seconds'%(time.time() - note_book_cell_seq_run_time))
Notebook total run time = 6.168 seconds